19 research outputs found

    Jetstream: A national science & engineering cloud. Globus World Lightning Talk.

    Presentation highlighting the capabilities and purpose of the upcoming Jetstream system. Jetstream and related material to its operation are funded by the National Science Foundation under grant No. ACI-1445604.

    Repository of NSF Funded Publications and Data Sets: "Back of Envelope" 15 year Cost Estimate

    In this back of envelope study we calculate the 15-year fixed and variable costs of setting up and running a data repository (or database) to store and serve the publications and datasets derived from research funded by the National Science Foundation (NSF). Costs are computed on a yearly basis using a fixed estimate of the number of papers that are published each year that list NSF as their funding agency. We assume each paper has one dataset and estimate the size of that dataset based on experience. By our estimates, the number of papers generated each year is 64,340. The average dataset size over all seven directorates of NSF is 32 gigabytes (GB). The total amount of data added to the repository is two petabytes (PB) per year, or 30 PB over 15 years. The architecture of the data/paper repository is based on a hierarchical storage model that uses a combination of fast disk for rapid access and tape for high reliability and cost-efficient long-term storage. Data are ingested through workflows that are used in university institutional repositories, which add metadata and ensure data integrity. Average fixed costs are approximately $0.90/GB over a 15-year span. Variable costs are estimated at a sliding scale of $150 – $100 per new dataset for up-front curation, or $4.87 – $3.22 per GB. Variable costs reflect a 3% annual decrease in curation costs as efficiency and automated metadata and provenance capture are anticipated to help reduce what are now largely manual curation efforts. The total projected cost of the data and paper repository is estimated at $167,000,000 over 15 years of operation, curating close to one million datasets and one million papers. After 15 years and 30 PB of data accumulated and curated, we estimate the cost per gigabyte at $5.56. This $167 million cost is a direct cost in that it does not include federally allowable indirect costs return (ICR). After 15 years, it is reasonable to assume that some datasets will be compressed and rarely accessed. Others may be deemed no longer valuable, e.g., because they are replaced by more accurate results. Therefore, at some point the data growth in the repository will need to be adjusted by use of strategic preservation.
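
The volume and per-gigabyte figures in this abstract can be roughly re-derived from the numbers it quotes. The sketch below is only an illustrative check, not the authors' cost model: the function names are hypothetical, and it assumes decimal units (1 PB = 1,000,000 GB). The unrounded product comes to about 2.06 PB per year, which the abstract rounds to two petabytes, and dividing the total cost by 30 PB gives about 5.57 dollars per GB, close to the quoted 5.56.

```python
# Back-of-envelope check of the figures quoted in the abstract.
# All constants come from the abstract; the helper names are hypothetical.
PAPERS_PER_YEAR = 64_340        # papers (and assumed datasets) per year
AVG_DATASET_GB = 32             # average dataset size in GB
YEARS = 15                      # planning horizon
TOTAL_COST_USD = 167_000_000    # projected direct cost over 15 years
GB_PER_PB = 1_000_000           # assumption: decimal units


def yearly_volume_gb() -> int:
    """Data added per year, one dataset per paper."""
    return PAPERS_PER_YEAR * AVG_DATASET_GB


def total_datasets() -> int:
    """Datasets curated over the whole horizon."""
    return PAPERS_PER_YEAR * YEARS


def cost_per_gb(total_pb: float = 30) -> float:
    """Total direct cost divided by the accumulated volume."""
    return TOTAL_COST_USD / (total_pb * GB_PER_PB)


if __name__ == "__main__":
    print(f"Yearly volume: {yearly_volume_gb() / GB_PER_PB:.2f} PB")
    print(f"Datasets over {YEARS} years: {total_datasets():,}")
    print(f"Cost per GB at 30 PB: {cost_per_gb():.2f} USD")
```

The dataset count of 965,100 matches the abstract's "close to one million" datasets.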

    Implementing a Data Publishing Service via DSpace

    Presented at the 4th International Conference on Open Repositories. This presentation was part of the session: DSpace User Group Presentations. Date: 2009-05-20, 01:30 PM – 03:00 PM. The Indiana University Libraries and Digital Library Program offer a set of online scholarly communication services to IU scholars under the brand IUScholarWorks. Currently, these services include IUScholarWorks Repository, a DSpace-based institutional repository for dissemination and preservation of articles, papers, technical reports, and other scholarly products, and IUScholarWorks Journals, an Open Journal System-based online journal hosting service. To complement these two existing services, the Libraries and Digital Library Program are collaborating with the Research Technologies division of IU's central IT organization to implement a research data publishing service as a new feature of IUScholarWorks Repository. The idea of this service is to allow researchers to easily publish their datasets for online access at a stable web address, reference these datasets from publications, and assume at least bit-level preservation of the data. The intent is to develop a service that is generic enough to be used for everything from sensor data to statistical data to ethnographic field video. This service will leverage IU's existing Massive Data Storage System (MDSS), a large-scale, centrally funded distributed storage service offered by Research Technologies to IU faculty, staff, and graduate students for storage of their research data. Based on the consortium-developed High Performance Storage System (HPSS) software, MDSS offers over 2.8 petabytes of disk- and tape-based storage distributed between IU's Bloomington and Indianapolis campuses and supports replication of data between these two sites. Data may be transferred in and out of MDSS using a variety of interfaces, including SFTP, Parallel FTP, GridFTP, HSI, SMB/CIFS, and a simple Web-based user interface.
We intend to initially support two data publishing scenarios: one in which a researcher submits a dataset by entering minimal metadata and uploading data files through DSpace's Configurable Submission Interface (the files are then automatically placed in MDSS if they are over a specified file size), and the other in which the researcher indicates as part of the submission process that the data to be published already resides in a personal or research group account in MDSS and should be copied into an IUScholarWorks-managed area of MDSS for availability through DSpace. In this presentation, we will discuss our conception of the service, its technical architecture and design, metadata requirements, and progress on implementation. We will also discuss the potential applicability of our approach and implementation to others who are interested in implementing similar services.

    Repository of NSF-funded Publications and Related Datasets: “Back of Envelope” Cost Estimate for 15 years

    In this back of envelope study we calculate the 15-year fixed and variable costs of setting up and running a data repository (or database) to store and serve the publications and datasets derived from research funded by the National Science Foundation (NSF). Costs are computed on a yearly basis using a fixed estimate of the number of papers that are published each year that list NSF as their funding agency. We assume each paper has one dataset and estimate the size of that dataset based on experience. By our estimates, the number of papers generated each year is 64,340. The average dataset size over all seven directorates of NSF is 32 gigabytes (GB). The total amount of data added to the repository is two petabytes (PB) per year, or 30 PB over 15 years. The architecture of the data/paper repository is based on a hierarchical storage model that uses a combination of fast disk for rapid access and tape for high reliability and cost-efficient long-term storage. Data are ingested through workflows that are used in university institutional repositories, which add metadata and ensure data integrity. Average fixed costs are approximately 0.90 cents per GB over a 15-year span. Variable costs are estimated at a sliding scale of 150-100 dollars per new dataset for up-front curation, or 4.87-3.22 dollars per GB. Variable costs reflect a 3% annual decrease in curation costs as efficiency and automated metadata and provenance capture are anticipated to help reduce what are now largely manual curation efforts. The total projected cost of the data and paper repository is estimated at 167,000,000 dollars over 15 years of operation, curating close to one million datasets and one million papers. After 15 years and 30 PB of data accumulated and curated, we estimate the cost per gigabyte at 5.56 dollars. This $167 million cost is a direct cost in that it does not include federally allowable indirect costs return (ICR). After 15 years, it is reasonable to assume that some datasets will be compressed and rarely accessed. Others may be deemed no longer valuable, e.g., because they are replaced by more accurate results. Therefore, at some point the data growth in the repository will need to be adjusted by use of strategic preservation.
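
One way to read the sliding scale for curation costs is as a geometric decline: start at 150 dollars per dataset and reduce the cost by 3% each year. The sketch below works through that reading. It is an assumption about how the abstract's 3% annual decrease is applied, not the report's actual model, and the function names are hypothetical. Under this reading the per-dataset cost in year 15 is a little under 98 dollars, which is in line with the quoted lower bound of 100 dollars.

```python
# Illustrative curation-cost schedule, assuming a geometric 3% annual decline.
# Constants come from the abstract; the model itself is an assumption.
FIRST_YEAR_COST_USD = 150.0   # up-front curation cost per dataset in year 1
ANNUAL_DECREASE = 0.03        # 3% annual decrease in curation cost
YEARS = 15                    # planning horizon
DATASETS_PER_YEAR = 64_340    # one dataset per paper


def curation_schedule(first=FIRST_YEAR_COST_USD,
                      decrease=ANNUAL_DECREASE,
                      years=YEARS):
    """Per-dataset curation cost for each year of the horizon."""
    return [first * (1 - decrease) ** k for k in range(years)]


def total_curation_cost(datasets_per_year=DATASETS_PER_YEAR):
    """Total variable (curation) cost summed over all years."""
    return datasets_per_year * sum(curation_schedule())


if __name__ == "__main__":
    schedule = curation_schedule()
    print(f"Year 1 cost per dataset:  {schedule[0]:.2f} USD")
    print(f"Year 15 cost per dataset: {schedule[-1]:.2f} USD")
    print(f"Total curation cost: {total_curation_cost() / 1e6:.1f} million USD")
```

Under these assumptions, curation accounts for roughly 118 million dollars of the 167 million dollar total, which would make it the largest cost component.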

    Conducting K-12 Outreach to Evoke Early Interest in IT, Science, and Advanced Technology

    This is a preprint of a paper presented at XSEDE '12: The 1st Conference of the Extreme Science and Engineering Discovery Environment, Chicago, Illinois. The Indiana University Pervasive Technology Institute has engaged for several years in K-12 Education, Outreach and Training (EOT) events related to technology in general and computing in particular. In each event we strive to positively influence children’s perception of science and technology. We view K-12 EOT as a channel for technical professionals to engage young people in the pursuit of scientific and technical understanding. Our goal is for students to see these subjects as interesting, exciting, and worth further pursuit. By providing opportunities for pre-college students to engage in science, technology, engineering and mathematics (STEM) activities first hand, we hope to influence their choices of careers and field of study later in life. In this paper we give an account of our experiences with providing EOT: we describe several of our workshops and events; we provide details regarding techniques that we found to be successful in working with both students and instructors; we discuss program costs and logistics; and we describe our plans for the future. This material is based upon work supported by the National Science Foundation under Grant No. OCI-0503697. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

    Usage of Indiana University computation and data cyberinfrastructure in FY 2011/2012 and assessment of future needs

    This report details the past and current cyberinfrastructure resources that have been deployed by the Research Technologies (RT) division of University Information Technologies Services to support research and scholarly activities at IU. This report also presents data and detailed analysis of system usage and services supported by RT for the FY 2011/2012 period, projects future usage trends based on these data, and provides several recommendations for the most effective ways to meet the growing need for high performance computing resources in research and scholarly endeavors. This research was supported in part by: The Pervasive Technology Institute, Indiana Metabolomics and Cytomics Initiative, and the Indiana Genomics Initiative. All of these initiatives have been supported in part by Lilly Endowment, Inc. Grant number 1U24AA014818-01 from NIAAA/NIH. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIAAA/NIH. National Science Foundation under Grants CDA-9601632, EIA-0116050, ACI-0338618l, OCI-0451237, OCI-0535258, OCI-0504075, CNS-0723054, and CNS-0521433. Shared University Research grants from IBM, Inc. to Indiana University. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies represented above.

    Indiana University Pervasive Technology Institute – Research Technologies: XSEDE Service Provider and XSEDE subcontract report (PY1: 1 July 2011 to 30 June 2012)

    Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF or XSEDE leadership. This document is a summary of the activities of the Research Technologies division of UITS, a Service & Cyberinfrastructure Center affiliated with the Indiana University Pervasive Technology Institute, as part of the eXtreme Science and Engineering Discovery Environment (XSEDE) during XSEDE Program Year 1 (1 July 2011 – 30 June 2012). This document consists of three parts. Section 2 describes IU’s activities as an XSEDE Service Provider, using the format prescribed by XSEDE for reporting such activities. Section 3 describes IU’s activities as part of XSEDE management, operations, and support activities funded under a subcontract from the National Center for Supercomputing Applications (NCSA), the lead organization for XSEDE; this section is organized by the XSEDE Work Breakdown Structure (WBS) plan. Appendix 1 is a summary table of IU’s education, outreach, and training events funded and supported in whole or in part by IU’s subcontract from NCSA as part of XSEDE. This document was developed with support from National Science Foundation (NSF) grant OCI-1053575.

    Indiana University's Advanced Cyberinfrastructure

    This is an archived document. The most current version may be found at http://pti.iu.edu/ci. The purpose of this document is to introduce researchers to Indiana University’s cyberinfrastructure: to clarify what these facilities make possible, and to discuss how to use them and the professional staff available to work with you. The resources described here are complex and varied, among the most advanced in the world. The intended audience is anyone unfamiliar with IU’s cyberinfrastructure.

    2012 Annual Report - Advanced Biomedical Information Technology Core

    This material is based upon work supported in part by the following funding agencies and grant awards: • Lilly Endowment, for its support of the Indiana Genomics Initiative (INGEN) – 2000; Indiana Metabolomics and Cytomics Initiative (METACyt); Indiana Pervasive Computing Research (IPCRES) initiative and Pervasive Technology Institute (1999 and 2008 respectively) • National Science Foundation under grants 01116050 MRI: Creation of the AVIDD Data Facility: A Distributed Facility for Managing, Analyzing and Visualizing Instrument-Driven Data (Michael A. McRobbie, PI); 0521433 MRI: Acquisition of a High-Speed, High Capacity Storage System to Support Scientific Computing: The Data Capacitor (Craig A. Stewart, PI); 0521433 ABI Development: National Center for Genome Analysis Support (Craig A. Stewart, PI) • National Institutes of Health NIAAA awards U24 AA014818-01 (Craig A. Stewart, PI) and U24 AA014818-04 (William K. Barnett, PI) Informatics Core for the Collaborative Initiative on Fetal Alcohol Spectrum Disorder • Subcontracts through the following NIH grant awards: 5P40RR024928 (Kenneth Cornetta, PI), 2U01AA014809 (Tatiana Foroud, PI), 1DP2OD007363-01 (Alexander Niculescu, PI), UL1RR025761-01 (Anantha Shekhar, PI), 3UL1RR025761-04S2 (Anantha Shekhar, PI), and 3UL1RR025761-04S3 (Anantha Shekhar, PI) • Funding from the general funds of Indiana University. Any opinions expressed in this document are those of the authors and do not necessarily reflect the views of the funding agencies above.

    Empowering Bioinformatics Workflows Using the Lustre Wide Area File System across a 100 Gigabit Network

    No full text
    Presented at BioITWorld Cloud Summit, September 11-13, 2012. Managing the profusion and accumulated volumes of life-science data is cumbersome; transferring them can require anything from shipping a hard drive to paying a graduate student to babysit transfers. Indiana University’s Data Capacitor solves this problem by exporting a high-performance Lustre file system across wide area networks to multiple locations. A mounted file system lets researchers run simple and familiar commands without having to contend with special tools for data transfer. Moreover, multiple mounts let researchers compute against their data from anywhere. To meet the insatiable bandwidth demands of life scientists, network infrastructure providers are increasingly offering 100 Gigabit circuits. IU recently used Lustre across a 100 Gigabit network spanning 2,300 miles to demonstrate application performance across a great distance. This presentation will describe the Data Capacitor cyberinfrastructure and associated work, explore future use cases applicable to bioinformatics, and explain how the National Center for Genome Analysis Support (NCGAS) at Indiana University intends to integrate the Data Capacitor into its workflows.